# Self-supervised pre-training
## RNAErnie
Publisher: multimolecule · Task: Molecular Model · Framework: PyTorch · Downloads: 11.00k · Likes: 1

RNAErnie is a model for self-supervised pre-training on non-coding RNA sequences. It uses a multi-stage masked language modeling objective to learn feature representations for RNA research.
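
A rough sketch of extracting RNA sequence embeddings from this checkpoint, assuming the `multimolecule` package is installed and exposes `RnaTokenizer` and `RnaErnieModel` for the `multimolecule/rnaernie` repository (these class and repository names follow the package's usual conventions and should be checked against its documentation):

```python
# Sketch: RNA feature extraction with RNAErnie via the multimolecule package.
# Assumes `pip install multimolecule`; class and repository names below are
# assumptions based on the package's naming conventions.
import torch
from multimolecule import RnaTokenizer, RnaErnieModel

tokenizer = RnaTokenizer.from_pretrained("multimolecule/rnaernie")
model = RnaErnieModel.from_pretrained("multimolecule/rnaernie")

inputs = tokenizer("UAGCUUAUCAGACUGAUGUUGA", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```
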
## ProphetNet Large Uncased
Publisher: microsoft · Task: Large Language Model (English) · Downloads: 5,528 · Likes: 5

ProphetNet is a sequence-to-sequence pre-trained language model whose self-supervised objective is future n-gram prediction: at each step it learns to predict the next several tokens rather than only the next one.
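
As a rough usage sketch, the checkpoint can be loaded with the Transformers ProphetNet classes for sequence-to-sequence generation; the repository id `microsoft/prophetnet-large-uncased` is assumed from the entry above, and in practice the pre-trained model is normally fine-tuned (e.g. for summarization) before its generations are useful:

```python
# Sketch: loading ProphetNet as a sequence-to-sequence model with Transformers.
# The pre-trained checkpoint is usually fine-tuned on a downstream task first.
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased")

text = "self-supervised pre-training learns representations from unlabeled data ."
input_ids = tokenizer(text, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, num_beams=4, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
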
## Mahadhwani Pretrained Conformer
Publisher: ai4bharat · License: MIT · Task: Speech Recognition · Downloads: 349 · Likes: 1

A pre-trained Conformer encoder trained with self-supervised learning, supporting automatic speech recognition for the 22 scheduled Indian languages.
## Dasheng Base
Publisher: mispeech · License: Apache-2.0 · Task: Audio Classification · Framework: Transformers · Downloads: 273 · Likes: 1

A large-scale general-purpose audio encoder trained with self-supervised learning, able to process audio from multiple domains, including speech, music, and environmental sounds.
## GPT-2 Demo
Publisher: demo-leaderboard · License: Other · Task: Large Language Model · Framework: Transformers · Downloads: 19.21k · Likes: 1

GPT-2 is a self-supervised pre-trained language model based on the Transformer architecture that excels at text generation.
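
Since this entry appears to be a demo copy of the standard GPT-2 weights, a minimal text-generation sketch with the Transformers pipeline looks like the following (the canonical `gpt2` checkpoint is used here; substitute the demo repository id if that is the one you need):

```python
# Sketch: GPT-2 text generation with the Transformers pipeline.
# Uses the canonical `gpt2` checkpoint; swap in the demo repository id if needed.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
outputs = generator("Self-supervised pre-training is", max_new_tokens=30, num_return_sequences=1)
print(outputs[0]["generated_text"])
```
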
## RegNetY 1280.seer
Publisher: timm · License: Other · Task: Image Classification · Framework: Transformers · Downloads: 62 · Likes: 0

RegNetY-128GF feature extraction model, pre-trained with the self-supervised SEER method on two billion random web images.
## ConvNeXtV2 Pico.fcmae
Publisher: timm · Task: Image Classification · Framework: Transformers · Downloads: 82 · Likes: 0

ConvNeXt-V2 self-supervised feature representation model, pre-trained with the Fully Convolutional Masked Autoencoder (FCMAE) framework, suitable for image classification and feature extraction.
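
A minimal sketch of using this checkpoint as a feature extractor through `timm`, assuming the model name `convnextv2_pico.fcmae`; the FCMAE weights ship without a classifier head, so `num_classes=0` yields pooled embeddings:

```python
# Sketch: image feature extraction with a ConvNeXt-V2 FCMAE checkpoint via timm.
# num_classes=0 drops the classification head and returns pooled features.
import timm
import torch
from PIL import Image

model = timm.create_model("convnextv2_pico.fcmae", pretrained=True, num_classes=0)
model.eval()

config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("example.jpg").convert("RGB")  # any local image
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))

print(features.shape)  # (1, embedding dimension)
```
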
## ConvNeXtV2 Large.fcmae
Publisher: timm · Task: Image Classification · Framework: Transformers · Downloads: 314 · Likes: 0

A self-supervised feature representation model based on ConvNeXt-V2, pre-trained with the Fully Convolutional Masked Autoencoder (FCMAE) framework, suitable for image classification and feature extraction.
## ViT-MSN Large 7
Publisher: facebook · License: Apache-2.0 · Task: Image Classification · Framework: Transformers · Downloads: 67 · Likes: 2

A Vision Transformer pre-trained with the MSN (Masked Siamese Networks) method. It performs well in few-shot scenarios and is suited to tasks such as image classification.
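
A rough sketch of pulling image embeddings from the MSN-pre-trained backbone with Transformers; the repository id `facebook/vit-msn-large-7` is assumed from the entry and should be verified on the hub, and for classification the backbone would normally be wrapped in `ViTMSNForImageClassification` and fine-tuned on a small labeled set:

```python
# Sketch: extracting features from a ViT-MSN backbone with Transformers.
# The repository id below is assumed from the entry; verify it on the hub.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTMSNModel

processor = AutoImageProcessor.from_pretrained("facebook/vit-msn-large-7")
model = ViTMSNModel.from_pretrained("facebook/vit-msn-large-7")

image = Image.open("example.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, number of patches + 1, hidden size)
```
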
## SwinV2 Small Patch4 Window8 256
Publisher: microsoft · License: Apache-2.0 · Task: Image Classification · Framework: Transformers · Downloads: 1,836 · Likes: 0

Swin Transformer v2 is a vision Transformer that processes images efficiently through hierarchical feature maps and local-window self-attention.
## ViWav2Vec2 Base 3k
Publisher: dragonSwing · Task: Speech Recognition · Framework: Transformers, Other · Downloads: 41 · Likes: 2

A Wav2Vec2 base model pre-trained on 3,000 hours of Vietnamese speech. It is intended for Vietnamese speech recognition and must be fine-tuned on a downstream task before use.
## RegNet Y 640 SEER In1k
Publisher: facebook · License: Apache-2.0 · Task: Image Classification · Framework: Transformers · Downloads: 21 · Likes: 0

A RegNet model pre-trained in a self-supervised manner (SEER) on billions of random web images and then fine-tuned on ImageNet-1k.
## XLM-Align Base
Publisher: microsoft · Task: Large Language Model · Framework: Transformers · Downloads: 354 · Likes: 9

XLM-Align is a pre-trained cross-lingual model covering 94 languages that improves on prior cross-lingual pre-training through self-labeled word alignment.
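
XLM-Align follows the usual XLM-R-style interface, so a minimal feature-extraction sketch with the Transformers Auto classes looks like this (repository id `microsoft/xlm-align-base` assumed from the entry):

```python
# Sketch: cross-lingual sentence features from XLM-Align via Transformers Auto classes.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/xlm-align-base")
model = AutoModel.from_pretrained("microsoft/xlm-align-base")

inputs = tokenizer("Self-supervised pre-training works across languages.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence length, hidden size)
```
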
## BEiT Large Patch16 512
Publisher: microsoft · License: Apache-2.0 · Task: Image Classification · Downloads: 683 · Likes: 11

BEiT is a vision Transformer-based image classification model, pre-trained in a self-supervised manner on ImageNet-21k and fine-tuned on ImageNet-1k.
## Wav2Vec2 Large 960h Lv60 Self
Publisher: facebook · License: Apache-2.0 · Task: Speech Recognition (English) · Downloads: 56.00k · Likes: 146

Facebook's Wav2Vec2 large model, pre-trained and fine-tuned on 960 hours of Libri-Light and LibriSpeech audio with a self-training objective, achieving state-of-the-art results on the LibriSpeech test sets.
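
A minimal transcription sketch with this checkpoint using the standard Transformers CTC interface; the input is assumed to be a 16 kHz mono waveform:

```python
# Sketch: English speech recognition with facebook/wav2vec2-large-960h-lv60-self.
# The model expects 16 kHz mono audio.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")

waveform = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence at 16 kHz
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```
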
## BEiT Large Finetuned ADE 640 640
Publisher: microsoft · License: Apache-2.0 · Task: Image Segmentation · Framework: Transformers · Downloads: 14.97k · Likes: 14

A BEiT model based on the Vision Transformer architecture for semantic segmentation, self-supervised pre-trained and then fine-tuned on the ADE20k dataset.
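
A rough semantic-segmentation sketch with Transformers, assuming the repository id `microsoft/beit-large-finetuned-ade-640-640` from the entry above:

```python
# Sketch: ADE20k semantic segmentation with a fine-tuned BEiT model.
import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForSemanticSegmentation

checkpoint = "microsoft/beit-large-finetuned-ade-640-640"  # repository id assumed from the entry
processor = BeitImageProcessor.from_pretrained(checkpoint)
model = BeitForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("scene.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, number of ADE20k classes, height/4, width/4)

segmentation = logits.argmax(dim=1)[0]  # per-pixel class indices at reduced resolution
print(segmentation.shape)
```
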
## BEiT Base Patch16 224
Publisher: nielsr · License: Apache-2.0 · Task: Image Classification · Downloads: 28 · Likes: 0

BEiT is a vision model based on image Transformers that uses a BERT-like self-supervised pre-training method. It is first pre-trained and fine-tuned on ImageNet-22k, then further fine-tuned on ImageNet-1k.
## BEiT Base Patch16 224
Publisher: microsoft · License: Apache-2.0 · Task: Image Classification · Downloads: 58.34k · Likes: 9

A Vision Transformer-based BEiT model pre-trained on ImageNet-21k with self-supervised learning and fine-tuned on ImageNet-1k for image classification.
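
A minimal classification sketch with Transformers for this checkpoint:

```python
# Sketch: ImageNet-1k classification with microsoft/beit-base-patch16-224.
import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForImageClassification

processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

image = Image.open("cat.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[logits.argmax(-1).item()])  # predicted ImageNet-1k label
```
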
## ProtT5-XL-BFD
Publisher: Rostlab · Task: Protein Model · Framework: Transformers · Downloads: 605 · Likes: 10

ProtT5-XL-BFD is a self-supervised model pre-trained on protein sequences using the T5 architecture. It was trained on 2.1 billion protein sequences and is intended for protein feature extraction and downstream fine-tuning.
## ProtT5-XL-UniRef50
Publisher: Rostlab · Task: Protein Model · Framework: Transformers · Downloads: 78.45k · Likes: 44

A protein sequence pre-training model based on the T5-3B architecture that captures biophysical properties of proteins through self-supervised learning.
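
A minimal sketch of extracting per-residue embeddings with the encoder half of this checkpoint, following the usual Rostlab conventions (amino acids separated by spaces, rare residues mapped to X):

```python
# Sketch: per-residue embeddings from ProtT5-XL-UniRef50 using only the T5 encoder.
# Note: this is a ~3B-parameter model, so the download and memory footprint are large.
import re
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50")
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50")
model.eval()

sequence = "MSILVTRPSPAGEEL"                 # example protein sequence
sequence = re.sub(r"[UZOB]", "X", sequence)  # map rare amino acids to X
sequence = " ".join(sequence)                # the tokenizer expects space-separated residues

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state  # (1, sequence length + 1, hidden size)

print(embeddings.shape)
```
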